Account for memory usage in SortPreservingMerge (#5885) #6382
Conversation
```rust
MemoryConsumer::new(format!("ExternalSorterMerge[{partition_id}]"))
    .register(&runtime.memory_pool);
// ...
merge_reservation.resize(EXTERNAL_SORTER_MERGE_RESERVATION);
```
I take it as a positive sign that this was required to make the spill tests pass; without it, the merge would exceed the memory limit and fail.
```rust
use tokio::task;
// ...
/// How much memory to reserve for performing in-memory sorts
const EXTERNAL_SORTER_MERGE_RESERVATION: usize = 10 * 1024 * 1024;
```
I'm not a massive fan of this, but it somewhat patches around the issue that once we initiate a merge we can no longer spill.
The problem with this approach is that even 10MB may not be enough to correctly merge the batches prior to spilling, so some queries that today would succeed (though exceed their memory limits) might fail.
It seems to me better approaches (as follow-on PRs) would be:
- Make this a config parameter so users can avoid the error by reserving more memory up front if needed (sketched below)
- Teach SortExec to write more (smaller) spill files if it doesn't have enough memory to merge the in-memory batches
However, given that the behavior on master today is to simply ignore the reservation and exceed the memory limit, this behavior seems better than before.
I suggest we merge this PR as is and file a follow-on ticket for the improved behavior.
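As a rough illustration of the first suggestion, a minimal sketch of making the merge reservation size configurable rather than a hard-coded constant. The `ExternalSorterOptions` struct and `sort_spill_reservation_bytes` field are hypothetical names chosen for this sketch, not the existing DataFusion config API:

```rust
// Sketch only: struct and field names are illustrative assumptions,
// not the real DataFusion configuration API.
pub struct ExternalSorterOptions {
    /// Memory reserved up front so the in-memory batches can still be
    /// merged once the pool is exhausted and a spill is required.
    /// Defaults to the current hard-coded 10 MiB.
    pub sort_spill_reservation_bytes: usize,
}

impl Default for ExternalSorterOptions {
    fn default() -> Self {
        Self {
            sort_spill_reservation_bytes: 10 * 1024 * 1024,
        }
    }
}

// The sorter would then size its merge reservation from the option
// instead of the EXTERNAL_SORTER_MERGE_RESERVATION constant:
//
//     merge_reservation.resize(options.sort_spill_reservation_bytes);
```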
```diff
 fn unregister(&self, consumer: &MemoryConsumer) {
     if consumer.can_spill {
-        self.state.lock().num_spill -= 1;
+        self.state.lock().num_spill.checked_sub(1).unwrap();
```
Drive-by sanity check (the first version of MemoryReservation::split would unregister the same consumer multiple times), and these debug checks are the only reason I noticed 😅
Maybe it would be worth adding some unit tests to MemoryReservation now, given it is growing in sophistication.
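As a starting point, here is a rough sketch of the kind of test that could cover the split/unregister accounting discussed above. The module path, `FairSpillPool`, `with_can_spill`, `split`, and `reserved` calls are assumed from the surrounding discussion and may not match the code exactly:

```rust
#[cfg(test)]
mod reservation_tests {
    use std::sync::Arc;
    // Assumed module path; adjust to wherever the memory pool types live.
    use datafusion_execution::memory_pool::{FairSpillPool, MemoryConsumer, MemoryPool};

    #[test]
    fn split_reservation_unregisters_consumer_once() {
        let pool: Arc<dyn MemoryPool> = Arc::new(FairSpillPool::new(1024 * 1024));
        let mut r1 = MemoryConsumer::new("sorter")
            .with_can_spill(true)
            .register(&pool);
        r1.grow(100);

        // Split off part of the reservation; both halves belong to the
        // same spillable consumer.
        let r2 = r1.split(40);
        assert_eq!(r1.size(), 60);
        assert_eq!(r2.size(), 40);
        assert_eq!(pool.reserved(), 100);

        // Dropping each half should release only its own bytes, and the
        // consumer accounting must stay consistent -- the checked_sub
        // debug check above would panic on a double unregister.
        drop(r2);
        assert_eq!(pool.reserved(), 60);
        drop(r1);
        assert_eq!(pool.reserved(), 0);
    }
}
```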
Force-pushed from 435337f to c6542e0
Force-pushed from c6542e0 to d180c8d
alamb left a comment
Thank you @tustvold -- I reviewed this code carefully and it makes sense to me.
However, when I ran the reproducer from https://github.com/influxdata/influxdb_iox/issues/7783 locally with this DataFusion patch, IOx still exceeded its memory limit significantly. I will post more details there.
While of course there are further improvements that could be made, I think this is a significant improvement.
```rust
self.merge_reservation.free();
// ...
self.in_mem_batches = self
    .in_mem_sort_stream(self.metrics.baseline.intermediate())?
```
I double-checked that in_mem_sort_stream correctly respects self.reservation 👍
```rust
use tokio::sync::mpsc::{Receiver, Sender};
use tokio::task;
// ...
/// How much memory to reserve for performing in-memory sorts
```
Suggested change:
```diff
-/// How much memory to reserve for performing in-memory sorts
+/// How much memory to reserve for performing in-memory sorts prior to spill
```
```rust
/// Reservation for in_mem_batches
reservation: MemoryReservation,
partition_id: usize,
/// Reservation for in memory sorting of batches
```
Suggested change:
```diff
-/// Reservation for in memory sorting of batches
+/// Reservation for in memory sorting of batches, prior to spilling.
+/// Without this reservation, when the memory budget is exhausted
+/// it might not be possible to merge the in memory batches as part
+/// of spilling.
```
```rust
rows: Rows,
// ...
#[allow(dead_code)]
```
I think it would help to add a comment here explaining why the code needs to keep a field that is never read (dead_code). I think it is to keep the reservation alive long enough?
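Something along these lines might do. The struct name and surrounding types are assumptions inferred from the diff above, not the actual code:

```rust
// Sketch only: `BatchCursor` and its fields are assumed from the diff above.
use arrow::row::Rows;
use datafusion_execution::memory_pool::MemoryReservation;

struct BatchCursor {
    /// The sorted rows this cursor iterates over
    rows: Rows,
    /// Never read, but holding the reservation here ties the lifetime of the
    /// accounted memory to `rows`: the bytes stay reserved in the pool until
    /// the cursor itself is dropped, at which point `Drop` releases them.
    #[allow(dead_code)]
    reservation: MemoryReservation,
}
```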
We have found another cause of the memory use in IOx downstream, but I still think this PR is valuable. Once we sort out the downstream issue we'll try to get this one polished up and ready to go.

I plan to try and help this PR over the line in the next day or two.

Converted to a draft as this PR is not ready to merge yet.

Ok, I really do plan to pick this code up tomorrow and work on it.

I have a new version of this code in #7130, on which I am making progress.
Which issue does this PR close?
Closes #5885
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?